Ames, Iowa Data Set: Introduction

  • This data set contains data on 2,930 real estate properties in Ames, Iowa, with 82 columns describing each property.
  • Many qualities of a property determine its value.
  • In this data set, we explore correlations among the variables to help buyers understand which factors matter most when purchasing a house.
  • The data set has many nominal and ordinal variables, which can be turned into meaningful visualizations.
  • We perform correlation analysis and descriptive analytics in this module.

Reading the data set:

 getwd()   # Show the working directory from which the file is read
## [1] "D:/Intermediate Analytics/Module one"
 Housing_dataset<-read.csv("AmesHousing.csv")

# Convert to data frame

Ames_House<-as.data.frame(Housing_dataset)       # read.csv() already returns a data frame; this keeps a copy under the name used below
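
The first column shows up as `ï..Order` in the output further down, which is the usual sign of a UTF-8 byte-order mark at the start of the CSV file. A minimal sketch of one way to avoid this, assuming the file is UTF-8 with a byte-order mark; the temporary file and its two rows are made up for illustration and are not the Ames file:

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (toy data)
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Order,SalePrice\n1,215000\n2,105000\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" asks R to drop the mark before parsing the header
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)
```

The same `fileEncoding` argument could be passed when reading `AmesHousing.csv`, if that file turns out to carry the mark.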

Describing the Data Set

  • Using the summary function, we find that the mean sale price of a house in Ames is approximately $180,800, whereas the median sale price is $160,000.
  • Many values in the data set are missing or left blank, and many columns consist largely of zeros and ones.
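
As a rough sketch of how the missing values and the mean/median gap can be checked directly, the snippet below counts `NA` entries per column and compares the two averages. The data frame `toy` is a made-up stand-in for `Ames_House`, so the numbers it prints are illustrative only:

```r
# Toy stand-in for Ames_House (values are made up for illustration)
toy <- data.frame(
  Lot.Frontage = c(80, NA, 68, NA),
  Pool.QC      = c(NA, NA, NA, "Gd"),
  SalePrice    = c(215000, 105000, 172000, 244000)
)

# Number of missing values per column, largest first
missing_counts <- sort(colSums(is.na(toy)), decreasing = TRUE)
missing_counts

# Mean and median of the sale price
c(mean = mean(toy$SalePrice), median = median(toy$SalePrice))
```

Replacing `toy` with `Ames_House` should give the per-column counts behind the differing `n` values in the descriptive statistics.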

Using Descriptive Statistics for Analysis

psych::describe(Ames_House)
                vars    n         mean           sd      median      trimmed
ï..Order           1 2930      1465.50       845.96      1465.5      1465.50
PID                2 2930 714464496.99 188730844.65 535453620.0 712423200.40
MS.SubClass        3 2930        57.39        42.64        50.0        49.76
MS.Zoning*         4 2930         5.97         0.87         6.0         6.07
Lot.Frontage       5 2440        69.22        23.37        68.0        68.35
Lot.Area           6 2930     10147.92      7880.02      9436.5      9481.05
Street*            7 2930         2.00         0.06         2.0         2.00
Alley*             8  198         1.39         0.49         1.0         1.37
Lot.Shape*         9 2930         2.94         1.41         4.0         3.05
Land.Contour*     10 2930         3.78         0.70         4.0         4.00
Utilities*        11 2930         1.00         0.06         1.0         1.00
Lot.Config*       12 2930         4.06         1.60         5.0         4.32
Land.Slope*       13 2930         1.05         0.25         1.0         1.00
Neighborhood*     14 2930        15.30         7.02        16.0        15.40
Condition.1*      15 2930         3.04         0.87         3.0         3.00
Condition.2*      16 2930         3.00         0.21         3.0         3.00
Bldg.Type*        17 2930         1.52         1.22         1.0         1.17
House.Style*      18 2930         4.02         1.91         3.0         4.01
Overall.Qual      19 2930         6.09         1.41         6.0         6.08
Overall.Cond      20 2930         5.56         1.11         5.0         5.47
Year.Built        21 2930      1971.36        30.25      1973.0      1974.25
Year.Remod.Add    22 2930      1984.27        20.86      1993.0      1985.63
Roof.Style*       23 2930         2.39         0.82         2.0         2.24
Roof.Matl*        24 2930         2.06         0.54         2.0         2.00
Exterior.1st*     25 2930        11.16         3.65        14.0        11.47
Exterior.2nd*     26 2930        11.87         4.00        15.0        12.19
Mas.Vnr.Type*     27 2930         4.43         1.08         5.0         4.46
Mas.Vnr.Area      28 2907       101.90       179.11         0.0        61.14
Exter.Qual*       29 2930         3.53         0.70         4.0         3.64
Exter.Cond*       30 2930         4.71         0.77         5.0         4.93
Foundation*       31 2930         2.39         0.73         2.0         2.45
Bsmt.Qual*        32 2851         4.69         1.31         4.0         4.85
Bsmt.Cond*        33 2851         5.80         0.70         6.0         6.00
Bsmt.Exposure*    34 2851         4.27         1.14         5.0         4.47
BsmtFin.Type.1*   35 2851         4.76         1.81         4.0         4.82
BsmtFin.SF.1      36 2929       442.63       455.59       370.0       384.08
BsmtFin.Type.2*   37 2851         6.67         1.02         7.0         6.97
BsmtFin.SF.2      38 2929        49.72       169.17         0.0         2.04
Bsmt.Unf.SF       39 2929       559.26       439.49       466.0       510.77
Total.Bsmt.SF     40 2929      1051.61       440.62       990.0      1035.05
Heating*          41 2930         2.03         0.25         2.0         2.00
Heating.QC*       42 2930         2.54         1.74         1.0         2.42
Central.Air*      43 2930         1.93         0.25         2.0         2.00
Electrical*       44 2930         5.68         1.05         6.0         6.00
X1st.Flr.SF       45 2930      1159.56       391.89      1084.0      1127.17
X2nd.Flr.SF       46 2930       335.46       428.40         0.0       272.90
Low.Qual.Fin.SF   47 2930         4.68        46.31         0.0         0.00
Gr.Liv.Area       48 2930      1499.69       505.51      1442.0      1452.25
Bsmt.Full.Bath    49 2928         0.43         0.52         0.0         0.40
Bsmt.Half.Bath    50 2928         0.06         0.25         0.0         0.00
Full.Bath         51 2930         1.57         0.55         2.0         1.56
Half.Bath         52 2930         0.38         0.50         0.0         0.34
Bedroom.AbvGr     53 2930         2.85         0.83         3.0         2.83
Kitchen.AbvGr     54 2930         1.04         0.21         1.0         1.00
Kitchen.Qual*     55 2930         3.86         1.27         5.0         4.03
TotRms.AbvGrd     56 2930         6.44         1.57         6.0         6.33
Functional*       57 2930         7.69         1.18         8.0         8.00
Fireplaces        58 2930         0.60         0.65         1.0         0.52
Fireplace.Qu*     59 1508         3.72         1.13         3.0         3.78
Garage.Type*      60 2773         3.28         1.79         2.0         3.11
Garage.Yr.Blt     61 2771      1978.13        25.53      1979.0      1980.71
Garage.Finish*    62 2773         3.18         0.82         3.0         3.23
Garage.Cars       63 2929         1.77         0.76         2.0         1.77
Garage.Area       64 2929       472.82       215.05       480.0       468.35
Garage.Qual*      65 2772         5.84         0.66         6.0         6.00
Garage.Cond*      66 2772         5.90         0.53         6.0         6.00
Paved.Drive*      67 2930         2.83         0.54         3.0         3.00
Wood.Deck.SF      68 2930        93.75       126.36         0.0        71.21
Open.Porch.SF     69 2930        47.53        67.48        27.0        33.87
Enclosed.Porch    70 2930        23.01        64.14         0.0         4.83
X3Ssn.Porch       71 2930         2.59        25.14         0.0         0.00
Screen.Porch      72 2930        16.00        56.09         0.0         0.00
Pool.Area         73 2930         2.24        35.60         0.0         0.00
Pool.QC*          74   13         2.46         1.20         3.0         2.45
Fence*            75  572         2.41         0.84         3.0         2.49
Misc.Feature*     76  106         3.85         0.55         4.0         4.00
Misc.Val          77 2930        50.64       566.34         0.0         0.00
Mo.Sold           78 2930         6.22         2.71         6.0         6.16
Yr.Sold           79 2930      2007.79         1.32      2008.0      2007.74
Sale.Type*        80 2930         9.36         1.88        10.0         9.87
Sale.Condition*   81 2930         4.78         1.08         5.0         5.00
SalePrice         82 2930    180796.06     79886.69    160000.0    170429.15
                        mad       min        max     range   skew kurtosis
ï..Order            1086.00         1       2930      2929   0.00    -1.20
PID             12373193.97 526301100 1007100110 480799010   0.06    -1.99
MS.SubClass           44.48        20        190       170   1.36     1.38
MS.Zoning*             0.00         1          7         6  -2.61     8.41
Lot.Frontage          17.79        21        313       292   1.50    11.20
Lot.Area            3024.50      1300     215245    213945  12.81   264.39
Street*                0.00         1          2         1 -15.52   239.01
Alley*                 0.00         1          2         1   0.43    -1.82
Lot.Shape*             0.00         1          4         3  -0.61    -1.60
Land.Contour*          0.00         1          4         3  -3.12     8.44
Utilities*             0.00         1          3         2  34.02  1187.96
Lot.Config*            0.00         1          5         4  -1.19    -0.44
Land.Slope*            0.00         1          3         2   4.98    26.62
Neighborhood*          8.90         1         28        27  -0.20    -1.19
Condition.1*           0.00         1          9         8   2.99    15.74
Condition.2*           0.00         1          8         7  12.08   308.97
Bldg.Type*             0.00         1          5         4   2.15     3.00
House.Style*           0.00         1          8         7   0.32    -0.95
Overall.Qual           1.48         1         10         9   0.19     0.05
Overall.Cond           0.00         1          9         8   0.57     1.48
Year.Built            37.06      1872       2010       138  -0.60    -0.50
Year.Remod.Add        20.76      1950       2010        60  -0.45    -1.34
Roof.Style*            0.00         1          6         5   1.56     0.89
Roof.Matl*             0.00         1          8         7   8.72    76.98
Exterior.1st*          1.48         1         16        15  -0.59    -0.76
Exterior.2nd*          2.97         1         17        16  -0.56    -0.90
Mas.Vnr.Type*          0.00         1          6         5  -0.69    -0.62
Mas.Vnr.Area           0.00         0       1600      1600   2.60     9.26
Exter.Qual*            0.00         1          4         3  -1.79     3.67
Exter.Cond*            0.00         1          5         4  -2.50     5.11
Foundation*            1.48         1          6         5   0.01     0.76
Bsmt.Qual*             2.97         1          6         5  -0.46    -0.82
Bsmt.Cond*             0.00         1          6         5  -3.36    10.07
Bsmt.Exposure*         0.00         1          5         4  -1.17    -0.30
BsmtFin.Type.1*        2.97         1          7         6  -0.04    -1.36
BsmtFin.SF.1         548.56         0       5644      5644   1.41     6.84
BsmtFin.Type.2*        0.00         1          7         6  -3.39    10.84
BsmtFin.SF.2           0.00         0       1526      1526   4.14    18.73
Bsmt.Unf.SF          415.13         0       2336      2336   0.92     0.40
Total.Bsmt.SF        349.89         0       6110      6110   1.16     9.11
Heating*               0.00         1          6         5  12.10   168.45
Heating.QC*            0.00         1          5         4   0.48    -1.52
Central.Air*           0.00         1          2         1  -3.47    10.01
Electrical*            0.00         1          6         5  -3.08     7.65
X1st.Flr.SF          349.89       334       5095      4761   1.47     6.95
X2nd.Flr.SF            0.00         0       2065      2065   0.87    -0.42
Low.Qual.Fin.SF        0.00         0       1064      1064  12.11   175.18
Gr.Liv.Area          461.09       334       5642      5308   1.27     4.12
Bsmt.Full.Bath         0.00         0          3         3   0.62    -0.75
Bsmt.Half.Bath         0.00         0          2         2   3.94    14.88
Full.Bath              0.00         0          4         4   0.17    -0.54
Half.Bath              0.00         0          2         2   0.70    -1.03
Bedroom.AbvGr          0.00         0          8         8   0.31     1.88
Kitchen.AbvGr          0.00         0          3         3   4.31    19.82
Kitchen.Qual*          0.00         1          5         4  -0.62    -0.68
TotRms.AbvGrd          1.48         2         15        13   0.75     1.15
Functional*            0.00         1          8         7  -3.83    13.79
Fireplaces             1.48         0          4         4   0.74     0.10
Fireplace.Qu*          1.48         1          5         4  -0.12    -1.01
Garage.Type*           0.00         1          6         5   0.75    -1.31
Garage.Yr.Blt         31.13      1895       2207       312  -0.38     1.82
Garage.Finish*         1.48         1          4         3  -0.35    -1.42
Garage.Cars            0.00         0          5         5  -0.22     0.24
Garage.Area          182.36         0       1488      1488   0.24     0.94
Garage.Qual*           0.00         1          6         5  -4.04    14.79
Garage.Cond*           0.00         1          6         5  -5.29    27.14
Paved.Drive*           0.00         1          3         2  -2.98     7.15
Wood.Deck.SF           0.00         0       1424      1424   1.84     6.73
Open.Porch.SF         40.03         0        742       742   2.53    10.92
Enclosed.Porch         0.00         0       1012      1012   4.01    28.42
X3Ssn.Porch            0.00         0        508       508  11.39   149.63
Screen.Porch           0.00         0        576       576   3.95    17.81
Pool.Area              0.00         0        800       800  16.92   299.06
Pool.QC*               1.48         1          4         3  -0.05    -1.68
Fence*                 0.00         1          4         3  -0.68    -0.89
Misc.Feature*          0.00         1          5         4  -3.16    10.37
Misc.Val               0.00         0      17000     17000  21.98   564.85
Mo.Sold                2.97         1         12        11   0.19    -0.46
Yr.Sold                1.48      2006       2010         4   0.13    -1.16
Sale.Type*             0.00         1         10         9  -3.32    10.76
Sale.Condition*        0.00         1          6         5  -2.79     7.25
SalePrice          54856.20     12789     755000    742211   1.74     5.10
                        se
ï..Order             15.63
PID             3486655.78
MS.SubClass           0.79
MS.Zoning*            0.02
Lot.Frontage          0.47
Lot.Area            145.58
Street*               0.00
Alley*                0.03
Lot.Shape*            0.03
Land.Contour*         0.01
Utilities*            0.00
Lot.Config*           0.03
Land.Slope*           0.00
Neighborhood*         0.13
Condition.1*          0.02
Condition.2*          0.00
Bldg.Type*            0.02
House.Style*          0.04
Overall.Qual          0.03
Overall.Cond          0.02
Year.Built            0.56
Year.Remod.Add        0.39
Roof.Style*           0.02
Roof.Matl*            0.01
Exterior.1st*         0.07
Exterior.2nd*         0.07
Mas.Vnr.Type*         0.02
Mas.Vnr.Area          3.32
Exter.Qual*           0.01
Exter.Cond*           0.01
Foundation*           0.01
Bsmt.Qual*            0.02
Bsmt.Cond*            0.01
Bsmt.Exposure*        0.02
BsmtFin.Type.1*       0.03
BsmtFin.SF.1          8.42
BsmtFin.Type.2*       0.02
BsmtFin.SF.2          3.13
Bsmt.Unf.SF           8.12
Total.Bsmt.SF         8.14
Heating*              0.00
Heating.QC*           0.03
Central.Air*          0.00
Electrical*           0.02
X1st.Flr.SF           7.24
X2nd.Flr.SF           7.91
Low.Qual.Fin.SF       0.86
Gr.Liv.Area           9.34
Bsmt.Full.Bath        0.01
Bsmt.Half.Bath        0.00
Full.Bath             0.01
Half.Bath             0.01
Bedroom.AbvGr         0.02
Kitchen.AbvGr         0.00
Kitchen.Qual*         0.02
TotRms.AbvGrd         0.03
Functional*           0.02
Fireplaces            0.01
Fireplace.Qu*         0.03
Garage.Type*          0.03
Garage.Yr.Blt         0.48
Garage.Finish*        0.02
Garage.Cars           0.01
Garage.Area           3.97
Garage.Qual*          0.01
Garage.Cond*          0.01
Paved.Drive*          0.01
Wood.Deck.SF          2.33
Open.Porch.SF         1.25
Enclosed.Porch        1.18
X3Ssn.Porch           0.46
Screen.Porch          1.04
Pool.Area             0.66
Pool.QC*              0.33
Fence*                0.03
Misc.Feature*         0.05
Misc.Val             10.46
Mo.Sold               0.05
Yr.Sold               0.02
Sale.Type*            0.03
Sale.Condition*       0.02
SalePrice          1475.84

Exploratory Data Analysis

Histogram

  • The data is right-skewed. The histogram indicates that cheaper houses are sold more often than more expensive ones.

  • A log transformation of the sale price reduces this skew; prices are strictly positive, so the logarithm is always defined.
    The median sale price is around $160,000. The dashed line across the histogram
    marks the mean sale price, about $180,800.

 library(magrittr)
 library(ggplot2)
 Histogram_House <- Ames_House %>%
   ggplot(aes(x = SalePrice, fill = Sale.Condition)) +
   geom_histogram() +
   geom_density(alpha = .2, fill = "White") +
   geom_vline(aes(xintercept = mean(SalePrice)),   # dashed line at the mean sale price
              color = "black", linetype = "dashed", size = 1) +
   ylab("Frequency") + xlab("Price of Houses")
 Histogram_House
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
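
The effect of the log transformation on skew can be sketched numerically. The prices below are made up, and the `skewness` helper is a simple moment-based definition written for this illustration; it may differ slightly from the estimator `psych::describe` uses:

```r
# Made-up right-skewed prices standing in for Ames_House$SalePrice
price <- c(90000, 120000, 135000, 150000, 160000, 175000, 210000, 320000, 755000)

# Moment-based sample skewness: positive values mean a long right tail
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw <- skewness(price)
skew_log <- skewness(log10(price))
c(raw = skew_raw, log10 = skew_log)

# In a right-skewed sample the mean sits above the median
c(mean = mean(price), median = median(price))
```

On these toy values the skew drops after taking logs and the mean lies above the median, the same pattern as the $180,800 mean and $160,000 median reported for the real data.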

Boxplot of Sale Price

  • The box plot of sale price shows many outliers in the sale values, so these records need to be reviewed and cleaned.
    Ames_House %>% ggplot(aes(x=SalePrice)) +
     geom_boxplot(fill='pink')+
     theme_light() + theme( legend.position = "none" )   # apply the complete theme first, then the legend tweak
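
To list the outlying prices rather than only see them as points, one option is `boxplot.stats()`, which applies the 1.5 × IQR whisker rule that box plots use by default. This is a sketch on made-up prices, not the Ames values:

```r
# Made-up prices standing in for Ames_House$SalePrice
price <- c(90000, 120000, 135000, 150000, 160000, 175000, 210000, 320000, 755000)

# Values beyond the whiskers (1.5 * IQR past the hinges)
flagged <- boxplot.stats(price)$out
flagged

# Share of observations flagged as outliers
length(flagged) / length(price)
```

A flagged value is not necessarily an error; very expensive houses can be genuine sales, so whether to drop, cap, or keep them is a modelling decision.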

Creating a Scatter Plot to understand which Neighbourhood has good land value

  • The graph shows that NridgHt and NAmes have expensive property prices.
  • OldTown has less expensive property prices.
  • NAmes has the most properties (443), whereas GrnHill has only 2.
table(Ames_House$Neighborhood)

Blmngtn Blueste  BrDale BrkSide ClearCr CollgCr Crawfor Edwards Gilbert  Greens 
     28      10      30     108      44     267     103     194     165       8 
GrnHill  IDOTRR Landmrk MeadowV Mitchel   NAmes NoRidge NPkVill NridgHt  NWAmes 
      2      93       1      37     114     443      71      23     166     131 
OldTown  Sawyer SawyerW Somerst StoneBr   SWISU  Timber Veenker 
    239     151     125     182      51      48      72      24 
Ames_House %>% ggplot( aes(x = SalePrice,y=Neighborhood, color=Neighborhood)) + 
geom_point(alpha=0.2) + xlab("Sales Price of House") + ylab("Neighbourhoods") +
ggtitle("Price Vs Neighbourhoods") + scale_x_log10()+ theme_light()
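
The scatter plot can be backed up with a numeric ranking. A minimal sketch, using a made-up data frame `toy` in place of `Ames_House`, computes the median sale price per neighbourhood and sorts it:

```r
# Toy stand-in for Ames_House (prices are made up for illustration)
toy <- data.frame(
  Neighborhood = c("NAmes", "NAmes", "NridgHt", "NridgHt", "OldTown", "OldTown"),
  SalePrice    = c(140000, 150000, 300000, 340000, 110000, 130000)
)

# Median sale price per neighbourhood, most expensive first
med_by_nbhd <- sort(
  sapply(split(toy$SalePrice, toy$Neighborhood), median),
  decreasing = TRUE
)
med_by_nbhd
```

The median is used here rather than the mean because sale prices are right-skewed, and neighbourhoods with only one or two sales (such as GrnHill or Landmrk) give unreliable summaries either way.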

Boxplot: Sale Price vs MS Zoning and House Style

  • The box plots are a bivariate analysis; the first plots Sale Price against the zoning class.
  • The zones are Residential (RH, RL, RM), Agricultural (A), Industrial (I), Commercial (C) and Floating Village (FV).
  • The graph shows that Village property is more expensive than Commercial property.
  • Agricultural property has the lowest value, although only two agricultural properties appear in the data.
  • The second box plot shows that the unfinished units have the lowest value.
  • Most units are priced between $100,000 and $300,000.
table(Ames_House$MS.Zoning)   #Using the table function to Understand the count of categorical data

A (agr) C (all)      FV I (all)      RH      RL      RM 
      2      25     139       2      27    2273     462 
  ggplot(Ames_House, aes(x = MS.Zoning, y = SalePrice, fill = MS.Zoning)) + 
  geom_boxplot(alpha=0.4) + theme(legend.position="top")+ ggtitle("Zone Vs Sale Price") +
  stat_summary(fun = "mean", geom = "point", shape = 6,size = 1)

  table(Ames_House$House.Style)  # Understand the count of House Styles

1.5Fin 1.5Unf 1Story 2.5Fin 2.5Unf 2Story SFoyer   SLvl 
   314     19   1481      8     24    873     83    128 
  ggplot(Ames_House, aes(x = House.Style, y = SalePrice, fill = House.Style)) + 
  geom_boxplot(alpha=0.3) + theme(legend.position="right")+ ggtitle("House Style vs Sale Price") +
  stat_summary(fun = "mean", geom = "point", shape = 1,size = 2) 

Stacked Bar Plot

  • The majority of the real estate properties have good basement quality and typical structure.
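
A stacked bar plot of this kind can be sketched in base R from a two-way table. The snippet assumes the two variables behind the statement are `Bsmt.Qual` and `Functional` (an assumption, since the plotting code is not shown), and uses a made-up data frame `toy`:

```r
# Toy stand-in for Ames_House (values are made up for illustration)
toy <- data.frame(
  Bsmt.Qual  = c("Gd", "Gd", "TA", "Ex", "Gd", "TA"),
  Functional = c("Typ", "Typ", "Typ", "Typ", "Min1", "Typ")
)

# Rows: basement quality, columns: functionality rating
counts <- table(toy$Bsmt.Qual, toy$Functional)
counts

# barplot() stacks the rows of a matrix within each column by default
barplot(counts, legend.text = TRUE,
        xlab = "Functional", ylab = "Number of properties",
        main = "Basement quality by functionality")
```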

Analyzing Correlation

  • From the subset of numerical and continuous variables taken from the Ames Housing data set, we find that Overall Quality (0.799) and Gr Liv Area (0.706) have a strong correlation with Sale Price.
  • Sale Price and Fireplaces have a weaker, moderate correlation (0.558).
  • Sale Price and Screen Porch have a very weak correlation (0.11).
 [1] "ï..Order"        "PID"             "MS.SubClass"     "MS.Zoning"      
 [5] "Lot.Frontage"    "Lot.Area"        "Street"          "Alley"          
 [9] "Lot.Shape"       "Land.Contour"    "Utilities"       "Lot.Config"     
[13] "Land.Slope"      "Neighborhood"    "Condition.1"     "Condition.2"    
[17] "Bldg.Type"       "House.Style"     "Overall.Qual"    "Overall.Cond"   
[21] "Year.Built"      "Year.Remod.Add"  "Roof.Style"      "Roof.Matl"      
[25] "Exterior.1st"    "Exterior.2nd"    "Mas.Vnr.Type"    "Mas.Vnr.Area"   
[29] "Exter.Qual"      "Exter.Cond"      "Foundation"      "Bsmt.Qual"      
[33] "Bsmt.Cond"       "Bsmt.Exposure"   "BsmtFin.Type.1"  "BsmtFin.SF.1"   
[37] "BsmtFin.Type.2"  "BsmtFin.SF.2"    "Bsmt.Unf.SF"     "Total.Bsmt.SF"  
[41] "Heating"         "Heating.QC"      "Central.Air"     "Electrical"     
[45] "X1st.Flr.SF"     "X2nd.Flr.SF"     "Low.Qual.Fin.SF" "Gr.Liv.Area"    
[49] "Bsmt.Full.Bath"  "Bsmt.Half.Bath"  "Full.Bath"       "Half.Bath"      
[53] "Bedroom.AbvGr"   "Kitchen.AbvGr"   "Kitchen.Qual"    "TotRms.AbvGrd"  
[57] "Functional"      "Fireplaces"      "Fireplace.Qu"    "Garage.Type"    
[61] "Garage.Yr.Blt"   "Garage.Finish"   "Garage.Cars"     "Garage.Area"    
[65] "Garage.Qual"     "Garage.Cond"     "Paved.Drive"     "Wood.Deck.SF"   
[69] "Open.Porch.SF"   "Enclosed.Porch"  "X3Ssn.Porch"     "Screen.Porch"   
[73] "Pool.Area"       "Pool.QC"         "Fence"           "Misc.Feature"   
[77] "Misc.Val"        "Mo.Sold"         "Yr.Sold"         "Sale.Type"      
[81] "Sale.Condition"  "SalePrice"      
              SalePrice Year.Built Screen.Porch Overall.Qual     Lot.Area
SalePrice     1.0000000  0.6134301   -0.6161196    0.9191334 -0.255050484
Year.Built    0.6134301  1.0000000   -0.5802653    0.7306693 -0.481776031
Screen.Porch -0.6161196 -0.5802653    1.0000000   -0.5877221 -0.268491483
Overall.Qual  0.9191334  0.7306693   -0.5877221    1.0000000 -0.433266192
Lot.Area     -0.2550505 -0.4817760   -0.2684915   -0.4332662  1.000000000
Fireplaces    0.2538283 -0.1765277   -0.2289111    0.1660880 -0.009641972
Gr.Liv.Area   0.7563544  0.1161936   -0.4819071    0.5931445 -0.010293064
               Fireplaces Gr.Liv.Area
SalePrice     0.253828277  0.75635436
Year.Built   -0.176527704  0.11619357
Screen.Porch -0.228911072 -0.48190715
Overall.Qual  0.166088038  0.59314453
Lot.Area     -0.009641972 -0.01029306
Fireplaces    1.000000000  0.38078153
Gr.Liv.Area   0.380781526  1.00000000
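
For reference, a Pearson correlation matrix for a handful of columns can be obtained by calling `cor()` directly on those columns. The sketch below uses a made-up data frame `toy` with the same column names; `use = "pairwise.complete.obs"` is one way to handle missing values, not necessarily the option used for the output above:

```r
# Toy stand-in for the selected Ames_House columns (values are made up)
toy <- data.frame(
  SalePrice    = c(105000, 172000, 215000, 244000, 189900, 320000),
  Overall.Qual = c(5, 6, 6, 7, 6, 8),
  Gr.Liv.Area  = c(896, 1329, 1656, 2110, 1629, 1800),
  Screen.Porch = c(120, 0, 0, 0, 0, 144)
)

# Symmetric matrix with ones on the diagonal and entries between -1 and 1
cor_mat <- round(cor(toy, use = "pairwise.complete.obs"), 3)
cor_mat

# Correlations with SalePrice, strongest positive first
sort(cor_mat[, "SalePrice"], decreasing = TRUE)
```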
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.

                     Description     R      p.value statistic conf.low conf.high
1        Lot Area and Sale Price 0.267 7.633843e-49    14.965    0.233     0.300
2 Overall Quality and Sale Price 0.799 0.000000e+00    71.964    0.786     0.812
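
The columns of this table correspond to what base R's `cor.test()` returns for a Pearson correlation. The statistic is a t value with n − 2 degrees of freedom, t = r·sqrt(n − 2) / sqrt(1 − r²); with r = 0.799 and n = 2930 this gives roughly 72, in line with the reported 71.964. A sketch on made-up data (the data frame `toy` is hypothetical, and the original table may have been built with a different helper):

```r
# Toy stand-in for Ames_House (values are made up for illustration)
toy <- data.frame(
  Lot.Area  = c(31770, 11622, 14267, 11160, 13830, 9978),
  SalePrice = c(215000, 105000, 172000, 244000, 189900, 195500)
)

res <- cor.test(toy$Lot.Area, toy$SalePrice)   # Pearson by default
r   <- unname(res$estimate)
df  <- unname(res$parameter)                   # n - 2

# One row in the same layout as the table above
data.frame(R = round(r, 3), p.value = res$p.value,
           statistic = round(unname(res$statistic), 3),
           conf.low  = round(res$conf.int[1], 3),
           conf.high = round(res$conf.int[2], 3))

# The test statistic recomputed by hand from r and the degrees of freedom
t_by_hand <- r * sqrt(df) / sqrt(1 - r^2)
```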

Plotting Heat Map

  • Categorical variables have to be converted to numeric codes (ordered factors or dummy variables) before they can be plotted or correlated. Nominal variables with no natural order should be left out of the correlation.
  • The resulting plot is not very clear; however, it sets the direction for further analysis.
# Converting the ordinal variables to numeric scores (Po = 1, Fa = 2, TA = 3, Gd = 4, Ex = 5).
# as.numeric() on a factor returns the position of each level and ignores the labels,
# so the levels are listed from worst to best.
quality_levels <- c("Po", "Fa", "TA", "Gd", "Ex")
Ames_House$Exter.Cond <-  as.numeric(factor(Ames_House$Exter.Cond, 
                                  levels = quality_levels, ordered = TRUE))

Ames_House$Heating.QC <-  as.numeric(factor(Ames_House$Heating.QC, 
                                  levels = quality_levels, ordered = TRUE))
# Central air as a 0/1 indicator (N = 0, Y = 1)
Ames_House$Central.Air <- as.integer(Ames_House$Central.Air == "Y")

model_var <- c('SalePrice', 
                'Overall.Qual','Exter.Cond','Overall.Cond','Total.Bsmt.SF','Heating.QC', 
                'Central.Air','Gr.Liv.Area','Bedroom.AbvGr','Kitchen.AbvGr',
                'TotRms.AbvGrd','Garage.Area','Open.Porch.SF','Yr.Sold')

heat <- Ames_House[,model_var]

options(repr.plot.width=8, repr.plot.height=6)

qplot(x=X1, y=X2, data=melt(cor(heat, use="p")), fill=value, geom="tile") +
   scale_fill_gradient2(low = "Blue", high = "pink", mid = "white", 
   midpoint = 0, limit = c(-1,1), space = "Lab", 
   name="Correlation") +
   theme_minimal()+ 
   theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1))+
   coord_fixed()+
   ggtitle("Figure 7 Correlation Heatmap") +
   theme(plot.title = element_text(hjust = 0.3))
Warning in type.convert.default(X[[i]], ...): 'as.is' should be specified by the
caller; using TRUE
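
One detail worth checking when ordinal ratings are scored through `factor()`: `as.numeric()` on a factor returns the integer position of each level, not its label. The sketch below shows the difference on five made-up ratings:

```r
x <- c("Ex", "TA", "Po", "Gd", "Fa")

# The labels are ignored by as.numeric(): the result is each level's position
by_position <- as.numeric(factor(x, levels = c("Ex", "Fa", "Gd", "TA", "Po"),
                                 labels = c(5, 2, 4, 3, 1)))
by_position   # 1 4 5 3 2

# Listing the levels from worst to best makes the position equal the score
by_score <- as.numeric(factor(x, levels = c("Po", "Fa", "TA", "Gd", "Ex")))
by_score      # 5 3 1 4 2
```

With the first form "Ex" is scored 1 and "Po" is scored 5, which reverses and scrambles the intended scale and would distort any correlation computed from it.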

Heat Map for Numerical Dataset

  • This heat map shows the correlations among all numeric variables.
  • It is not a very clean map, but it gives an overall graphical impression of which variables are strongly correlated and which are only weakly correlated.
# Using plotly (ggplotly) to render the graph interactively

library(plotly)

library(reshape2)
  AmesData_Numeric_Vals=Ames_House%>%
  dplyr::select_if(is.numeric)%>%   # keep only the numeric columns
  tidyr::drop_na()                  # remove rows with any missing value
  Analyse_Data=cor(AmesData_Numeric_Vals)             
  Analyse_melt=melt(Analyse_Data)     # reshape the correlation matrix to long format (one row per variable pair)
Warning in type.convert.default(X[[i]], ...): 'as.is' should be specified by the
caller; using TRUE
  Correlation_Ames=ggplot(Analyse_melt,mapping=aes(x=X1,y=X2,fill=value))+
  geom_tile()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  theme(text = element_text(size=9))+
  ggtitle("Heat Map for Ames Iowa Housing Data")+
  ylab("All the factors that are numeric in nature")+
  xlab("All the factors that are numeric in nature")+
  scale_fill_distiller(palette = "BuPu")
  ggplotly(Correlation_Ames, tooltip="text")
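
Because a heat map with this many variables is hard to read, a ranked list of the strongest pairs can complement it. A sketch in base R, using a made-up data frame `toy` in place of the numeric Ames columns:

```r
# Toy stand-in for the numeric Ames_House columns (values are made up)
toy <- data.frame(
  SalePrice    = c(105000, 172000, 215000, 244000, 189900, 320000),
  Overall.Qual = c(5, 6, 6, 7, 6, 8),
  Gr.Liv.Area  = c(896, 1329, 1656, 2110, 1629, 1800),
  Screen.Porch = c(120, 0, 0, 0, 0, 144)
)

cm <- cor(toy)
cm[lower.tri(cm, diag = TRUE)] <- NA    # keep each variable pair once

# Long format: one row per pair, sorted by absolute correlation
pair_list <- na.omit(as.data.frame(as.table(cm), responseName = "r"))
pair_list <- pair_list[order(-abs(pair_list$r)), ]
head(pair_list, 3)
```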

Analysis:

  • The data contains 78 predictor variables that describe various aspects of an Ames home. These attributes are continuous, discrete, nominal, or ordinal. There are 44 categorical variables: 21 nominal and 23 ordinal.

  • Sale Price has a strong correlation with the Overall Quality of the house, whereas Fireplaces shows a weaker correlation.

References: